Deriving the Hat Matrix and QR Decomposition

Dr. Lucy D’Agostino McGowan

Deriving the Hat Matrix

The Sum of Squared Errors

Recall our objective: \[\text{SSE} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]

Goal: Find \(\hat{\boldsymbol{\beta}}\) that minimizes SSE

Method: Take the derivative with respect to \(\boldsymbol{\beta}\) and set equal to zero

Expanding the SSE Expression

Start with: \[\text{SSE} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]

Expand: \[\text{SSE} = \mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]

Expanding the SSE Expression

Since \(\mathbf{y}^T\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}\) (both are scalars): \[\text{SSE} = \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]

Taking the Derivative

Matrix calculus rules we need:

  • \(\frac{\partial}{\partial \boldsymbol{\beta}}(\mathbf{a}^T\boldsymbol{\beta}) = \mathbf{a}\)
  • \(\frac{\partial}{\partial \boldsymbol{\beta}}(\boldsymbol{\beta}^T\mathbf{A}\boldsymbol{\beta}) = 2\mathbf{A}\boldsymbol{\beta}\) (when \(\mathbf{A}\) is symmetric)

Taking the derivative: \[\frac{\partial \text{SSE}}{\partial \boldsymbol{\beta}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]

Setting Equal to Zero

Set the derivative equal to zero: \[-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]

Divide by 2: \[-\mathbf{X}^T\mathbf{y} + \mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]

Setting Equal to Zero

Rearrange: \[\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\]
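The normal equations can be checked numerically. Below is a small sketch (the simulated data here are illustrative, not from the slides) confirming that the coefficients `lm()` returns satisfy \(\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}\):

```r
# Sanity check: lm()'s coefficients satisfy the normal equations
set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)
X <- cbind(1, x)

beta_hat <- coef(lm(y ~ x))

lhs <- t(X) %*% X %*% beta_hat  # X'X beta-hat
rhs <- t(X) %*% y               # X'y
all.equal(c(lhs), c(rhs))       # TRUE (up to floating point)
```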

The Least Squares Solution

Solve for \(\hat{\boldsymbol{\beta}}\), assuming \(\mathbf{X}^T\mathbf{X}\) is invertible (i.e., \(\mathbf{X}\) has full column rank): \[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]

Verification in R

set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)

X <- cbind(1, x)

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

model <- lm(y ~ x)
cbind(beta_hat, coef(model))
      [,1]     [,2]
  1.662353 1.662353
x 3.109464 3.109464

QR Decomposition

What is QR Decomposition?

Any matrix \(\mathbf{X}\) can be decomposed as: \[\mathbf{X} = \mathbf{Q}\mathbf{R}\]

Where:

  • \(\mathbf{Q}\) has orthonormal columns: \(\mathbf{Q}^T\mathbf{Q} = \mathbf{I}\) (in the thin QR used for regression, \(\mathbf{Q}\) is \(n \times p\))
  • \(\mathbf{R}\) is upper triangular

What is QR Decomposition?

Any matrix \(\mathbf{X}\) can be decomposed as: \[\mathbf{X} = \mathbf{Q}\mathbf{R}\]

In words: Break down \(\mathbf{X}\) into a set of perpendicular directions (\(\mathbf{Q}\)) and an upper-triangular recipe (\(\mathbf{R}\)) for recombining those directions into the original columns
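These defining properties are easy to verify numerically. A minimal sketch with R's built-in `qr()`, using a small simulated design matrix for illustration:

```r
# Verify Q'Q = I and QR = X for a small design matrix
set.seed(1)
x <- 1:10
X <- cbind(1, x)

qr_decomp <- qr(X)
Q <- qr.Q(qr_decomp)  # 10 x 2, orthonormal columns
R <- qr.R(qr_decomp)  # 2 x 2, upper triangular

all.equal(t(Q) %*% Q, diag(2), check.attributes = FALSE)  # TRUE
all.equal(Q %*% R, X, check.attributes = FALSE)           # TRUE
```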

Why Use QR for Regression?

Traditional approach requires: \[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]

Why Use QR for Regression?

Problems with \((\mathbf{X}^T\mathbf{X})^{-1}\):

  • Computing \(\mathbf{X}^T\mathbf{X}\) can be numerically unstable
  • Matrix inversion is computationally expensive
  • Condition number gets squared: \(\kappa(\mathbf{X}^T\mathbf{X}) = \kappa(\mathbf{X})^2\)

Demo

What do I mean by “unstable”?

x <- seq(1, 500, len = 30)
X <- cbind(1, x, x^2, x^3)
beta <- matrix(c(1, 1, 1, 1), 4, 1)

set.seed(1)
y <- X %*% beta + rnorm(30)
solve(crossprod(X))

Error in solve.default(crossprod(X)) : system is computationally singular: reciprocal condition number = 3.11914e-17

log10(crossprod(X))
                   x                    
  1.477121  3.875929  6.406189  8.987506
x 3.875929  6.406189  8.987506 11.596810
  6.406189  8.987506 11.596810 14.223800
  8.987506 11.596810 14.223800 16.862984

QR Solution

If \(\mathbf{X} = \mathbf{Q}\mathbf{R}\), then: \[\mathbf{X}^T\mathbf{X} = (\mathbf{Q}\mathbf{R})^T(\mathbf{Q}\mathbf{R}) = \mathbf{R}^T\mathbf{Q}^T\mathbf{Q}\mathbf{R} = \mathbf{R}^T\mathbf{R}\]

QR Solution

So: \[\hat{\boldsymbol{\beta}} = (\mathbf{R}^T\mathbf{R})^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y}\]

Note that \((\mathbf{AB})^{-1}=\mathbf{B}^{-1}\mathbf{A}^{-1}\) as long as both \(\mathbf{A}\) and \(\mathbf{B}\) are invertible (so if \(\mathbf{R}\) is invertible then \((\mathbf{R}^T\mathbf{R})^{-1} = \mathbf{R}^{-1}(\mathbf{R}^T)^{-1}\)).

QR Solution

So: \[ \begin{align} \hat{\boldsymbol{\beta}} &= (\mathbf{R}^T\mathbf{R})^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y} \\&= \mathbf{R}^{-1}(\mathbf{R}^T)^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y}\\& = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y} \end{align} \]

QR Advantages

  • Only need to solve \(\mathbf{R}\hat{\boldsymbol{\beta}} = \mathbf{Q}^T\mathbf{y}\) (back substitution)
  • More numerically stable
  • Avoids explicit matrix inversion

Backsolve Example

Given: \[\mathbf{R} = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad \mathbf{Q}^T\mathbf{y} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]

Solve: \(\mathbf{R}\hat{\boldsymbol{\beta}} = \mathbf{Q}^T\mathbf{y}\)

Step-by-Step Backsolve

\[\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]

Step 1: Solve bottom row first \[5\hat\beta_3 = 15 \implies \hat\beta_3 = 3\]

Step-by-Step Backsolve

\[\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]

Step 2: Substitute into second row \[4\hat\beta_2 + 2(3) = 12 \implies 4\hat\beta_2 = 6 \implies \hat\beta_2 = 1.5\]

Step-by-Step Backsolve

\[\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]

Step 3: Substitute into first row \[2\hat\beta_1 + 1(1.5) + 3(3) = 8 \implies 2\hat\beta_1 = -2.5 \implies \hat\beta_1 = -1.25\]

Backsolve Solution

\[\hat{\boldsymbol{\beta}} = \begin{bmatrix} -1.25 \\ 1.5 \\ 3 \end{bmatrix}\]

Key insight: Work backwards through the triangular matrix, using previously solved values to eliminate unknowns in each step.
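The hand-computed steps above can be checked directly with R's `backsolve()`, which performs exactly this back substitution:

```r
# Verify the worked backsolve example
R <- matrix(c(2, 0, 0,
              1, 4, 0,
              3, 2, 5), nrow = 3)  # upper triangular (filled column by column)
qty <- c(8, 12, 15)                # Q'y from the example

backsolve(R, qty)
# [1] -1.25  1.50  3.00
```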

You Try: QR Properties

Given \(\mathbf{X} = \mathbf{Q}\mathbf{R}\), verify that:

  1. \(\mathbf{X}^T\mathbf{X} = \mathbf{R}^T\mathbf{R}\)
  2. The hat matrix becomes \(\mathbf{H} = \mathbf{Q}\mathbf{Q}^T\)

Hint: Use the fact that \(\mathbf{Q}^T\mathbf{Q} = \mathbf{I}\) and remember \((\mathbf{AB})^{-1}=\mathbf{B}^{-1}\mathbf{A}^{-1}\) as long as both \(\mathbf{A}\) and \(\mathbf{B}\) are invertible.


QR in R

# Using our previous data

# QR decomposition
qr_decomp <- qr(X)
Q <- qr.Q(qr_decomp)
R <- qr.R(qr_decomp)

# Solve using QR
beta_qr <- backsolve(R, t(Q) %*% y)
beta_qr
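As a self-contained sketch, the QR fit on the ill-conditioned cubic design from the demo can be compared against `lm()`, which also uses a QR decomposition internally, so the two estimates should agree closely even though `solve(crossprod(X))` failed:

```r
# QR fit on the ill-conditioned cubic design, compared with lm()
x <- seq(1, 500, len = 30)
X <- cbind(1, x, x^2, x^3)
beta <- c(1, 1, 1, 1)

set.seed(1)
y <- X %*% beta + rnorm(30)

qr_decomp <- qr(X)
beta_qr <- backsolve(qr.R(qr_decomp), t(qr.Q(qr_decomp)) %*% y)

beta_lm <- coef(lm(y ~ x + I(x^2) + I(x^3)))
max(abs(c(beta_qr) - beta_lm))  # should be tiny
```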

Why QR Matters

Condition number comparison:

# Compare condition numbers
kappa(X)               # Condition of X
[1] 158915720
kappa(t(X) %*% X)      # Condition of X'X (much worse!)
[1] 3.165203e+16
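Because \(\mathbf{Q}\) has orthonormal columns, \(\mathbf{R}\) has the same singular values as \(\mathbf{X}\), so \(\kappa(\mathbf{R}) = \kappa(\mathbf{X})\): solving with \(\mathbf{R}\) never squares the condition number. A quick check, rebuilding the cubic design from the demo:

```r
# kappa(R) matches kappa(X): QR avoids forming the squared-condition X'X
x <- seq(1, 500, len = 30)
X <- cbind(1, x, x^2, x^3)

R <- qr.R(qr(X))

kappa(X, exact = TRUE)
kappa(R, exact = TRUE)  # same singular values, so same condition number
```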

You Try: Fitted Values

Calculate \(\hat{y}\) using the QR elements.
